Auditing SNOMED CT hierarchical relations based on lexical features of concepts in non-lattice subgraphs

Authors

  • Licong Cui
  • Olivier Bodenreider
  • Jay Shi
  • Guo-Qiang Zhang
Abstract

OBJECTIVE We introduce a structural-lexical approach for auditing SNOMED CT using a combination of non-lattice subgraphs of the underlying hierarchical relations and enriched lexical attributes of fully specified concept names. Our goal is to develop a scalable and effective approach that automatically identifies missing hierarchical IS-A relations.

METHODS Our approach involves 3 stages. In stage 1, all non-lattice subgraphs of SNOMED CT's IS-A hierarchical relations are extracted. In stage 2, lexical attributes of fully-specified concept names in such non-lattice subgraphs are extracted. For each concept in a non-lattice subgraph, we enrich its set of attributes with attributes from its ancestor concepts within the non-lattice subgraph. In stage 3, subset inclusion relations between the lexical attribute sets of each pair of concepts in each non-lattice subgraph are compared to existing IS-A relations in SNOMED CT. For concept pairs within each non-lattice subgraph, if a subset relation is identified but an IS-A relation is not present in SNOMED CT IS-A transitive closure, then a missing IS-A relation is reported. The September 2017 release of SNOMED CT (US edition) was used in this investigation.

RESULTS A total of 14,380 non-lattice subgraphs were extracted, from which we suggested a total of 41,357 missing IS-A relations. For evaluation purposes, 200 non-lattice subgraphs were randomly selected from 996 smaller subgraphs (of size 4, 5, or 6) within the "Clinical Finding" and "Procedure" sub-hierarchies. Two domain experts confirmed 185 (among 223) suggested missing IS-A relations, a precision of 82.96%.

CONCLUSIONS Our results demonstrate that analyzing the lexical features of concepts in non-lattice subgraphs is an effective approach for auditing SNOMED CT.
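Stages 2 and 3 of the method can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a lexical attribute is simply a lowercase word of the fully specified name with the trailing semantic tag removed, that a strict subset of attributes indicates the more general concept, and that the non-lattice subgraph from stage 1 is already available. The function names, concept identifiers (`U1`, `U2`, `L1`, `L2`), and concept names are invented for illustration and are not actual SNOMED CT content.

```python
import re
from itertools import permutations


def lexical_attributes(fsn):
    """Lowercase words of a fully specified name, trailing semantic tag dropped (assumed tokenization)."""
    base = re.sub(r"\s*\([^()]*\)\s*$", "", fsn)
    return set(re.findall(r"[a-z0-9]+", base.lower()))


def ancestors_of(concept, parents, cache):
    """All ancestors of a concept under the direct IS-A map `parents` (assumed acyclic)."""
    if concept not in cache:
        found = set()
        for p in parents.get(concept, ()):
            found.add(p)
            found |= ancestors_of(p, parents, cache)
        cache[concept] = found
    return cache[concept]


def suggest_missing_is_a(subgraph, names, parents):
    """Return (child, parent) pairs suggested as missing IS-A relations in one subgraph."""
    cache = {}
    enriched = {}
    # Stage 2: a concept's own attributes plus those of its ancestors inside the subgraph.
    for c in subgraph:
        attrs = lexical_attributes(names[c])
        for a in ancestors_of(c, parents, cache) & subgraph:
            attrs |= lexical_attributes(names[a])
        enriched[c] = attrs
    # Stage 3: strict subset of attributes, but no IS-A in the transitive closure.
    suggestions = []
    for child, parent in permutations(sorted(subgraph), 2):
        is_subset = enriched[parent] < enriched[child]
        if is_subset and parent not in ancestors_of(child, parents, cache):
            suggestions.append((child, parent))
    return suggestions


# Toy non-lattice subgraph: U1 and U2 share two maximal common descendants, L1 and L2.
names = {
    "U1": "Infection of skin (disorder)",
    "U2": "Chronic infection (disorder)",
    "L1": "Chronic infection of skin (disorder)",
    "L2": "Chronic infection of skin of foot (disorder)",
}
parents = {"L1": {"U1", "U2"}, "L2": {"U1", "U2"}}
subgraph = {"U1", "U2", "L1", "L2"}

print(suggest_missing_is_a(subgraph, names, parents))  # [('L2', 'L1')]
```

In this toy case the only suggestion is that `L2` IS-A `L1`, because every attribute of `L1` also appears in `L2` while no IS-A path connects them. The reported precision follows the same spirit of simple counting: 185 confirmed out of 223 suggested relations gives 185 / 223 ≈ 82.96%.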


Similar resources

Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT

Objective Quality assurance of large ontological systems such as SNOMED CT is an indispensable part of the terminology management lifecycle. We introduce a hybrid structural-lexical method for scalable and systematic discovery of missing hierarchical relations and concepts in SNOMED CT. Material and Methods All non-lattice subgraphs (the structural part) in SNOMED CT are exhaustively extracte...


Detecting Misaligned and Missing Concepts in SNOMED CT using Structural and Lexical Patterns

Objective: Quality assurance of large ontological systems such as SNOMED CT is an indispensable part of the terminology management lifecycle. We introduce a hybrid structural-lexical method for scalable and systematic discovery of novel anomalies in SNOMED CT. The structural component is based on shared isa relations to other concepts. The lexical component leverages shared words in description...


Identifying Missing Hierarchical Relations in SNOMED CT from Logical Definitions Based on the Lexical Features of Concept Names

Objectives. To identify missing hierarchical relations in SNOMED CT from logical definitions based on the lexical features of concept names. Methods. We first create logical definitions from the lexical features of concept names, which we represent in OWL EL. We infer hierarchical (subClassOf) relations among these concepts using the ELK reasoner. Finally, we compare the hierarchy obtained from...


Large-Scale, Exhaustive Lattice-Based Structural Auditing of SNOMED CT

One criterion for the well-formedness of ontologies is that their hierarchical structure forms a lattice. Formal Concept Analysis (FCA) has been used as a technique for assessing the quality of ontologies, but is not scalable to large ontologies such as SNOMED CT (> 300k concepts). We developed a methodology called Lattice-based Structural Auditing (LaSA), for auditing biomedical ontologies, im...


Identifying Potentially Missing Hierarchical Relations in SNOMED CT based on Lexical Features - Impact of Synonyms and Lexico-syntactic Constraints

Introduction: The quality assurance of large bio-ontologies is extremely critical for their effective and continued use and is an active area of research [1]. For example, recent investigations highlighted issues in the hierarchical structure of SNOMED CT and its detrimental effects on biomedical applications [2]. Previous work by one of the authors [3] established a method to identify potentially missin...



Journal:
  • Journal of biomedical informatics

Volume 78

Publication date: 2018